
    FrameDP: sensitive peptide detection on noisy matured sequences

    Summary: Transcriptome sequencing represents a fundamental source of information for genome-wide studies and transcriptome analysis and will become increasingly important for expression analysis as new sequencing technologies take over from array technology. The identification of the protein-coding region in transcript sequences is a prerequisite for systematic amino acid-level analysis and more specifically for domain identification. In this article, we present FrameDP, a self-training integrative pipeline for predicting CDS in transcripts which can adapt itself to different levels of sequence quality.

    Soft Concurrent Constraint Programming

    Soft constraints extend classical constraints to represent multiple consistency levels, and thus provide a way to express preferences, fuzziness, and uncertainty. While there are many soft constraint solving formalisms, even distributed ones, so far there seems to be no concurrent programming framework in which soft constraints can be handled. In this paper we show how the classical concurrent constraint (cc) programming framework can work with soft constraints, and we also propose an extension of cc languages which can use soft constraints to prune and direct the search for a solution. We believe that this new programming paradigm, called soft cc (scc), can also be very useful in many web-related scenarios. In fact, the language level allows web agents to express their interaction and negotiation protocols, and also to post their requests in terms of preferences, and the underlying soft constraint solver can find an agreement among the agents even if their requests are incompatible. Comment: 25 pages, 4 figures, submitted to the ACM Transactions on Computational Logic (TOCL), zipped file.
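
To make the idea of graded consistency levels concrete, here is a minimal Python sketch of a fuzzy soft constraint problem. It assumes the fuzzy instance often used in the soft-constraint literature, where constraint levels in [0, 1] are combined with min and alternatives are compared with max. It is not the scc language from the paper; the two agents, the meeting-hour domain, and all preference values are invented for illustration.

```python
# Hypothetical domain: candidate meeting hours two agents negotiate over.
HOURS = [9, 10, 11]

def agent_a(hour):
    """Agent A prefers early meetings (preference level in [0, 1])."""
    return {9: 1.0, 10: 0.6, 11: 0.2}[hour]

def agent_b(hour):
    """Agent B prefers late meetings (preference level in [0, 1])."""
    return {9: 0.1, 10: 0.7, 11: 1.0}[hour]

def combine(levels):
    """Fuzzy combination: a choice is only as good as its worst constraint."""
    return min(levels)

def best_solution(domain, constraints):
    """Return (level, value) for the value with the highest combined level."""
    scored = [(combine([c(v) for c in constraints]), v) for v in domain]
    return max(scored)

level, hour = best_solution(HOURS, [agent_a, agent_b])
print(f"agreed hour={hour}, consistency level={level}")
```

With these made-up preferences the agents' first choices conflict (9 versus 11), yet the sketch still returns a compromise: hour 10 at level 0.6. A hard-constraint formulation that required each agent's top choice would have no solution at all.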

    Modelling the conference paper assignment problem

    In this paper we describe different constraints and models for the conference paper assignment problem. While the core problem is a simple flow problem, additional constraints often arise to tailor a solution to specific wishes, or to increase perceived fairness for reviewers and/or submissions. We show some results from actual conference paper assignments, and also investigate the scalability of the method for large-scale events.
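
The abstract describes the core assignment problem as a simple flow problem. A minimal sketch of that view, under assumptions of my own: each reviewer can take at most `max_load` papers, each paper needs `reviews_per_paper` reviews, and a reviewer may only receive papers they are eligible for. The helper names `max_flow` and `assign`, the reviewer names, and the paper IDs are hypothetical and not taken from the paper, which also covers additional constraints not modelled here.

```python
from collections import deque, defaultdict

def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow on a dict-of-dicts residual graph (mutated in place)."""
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, capacity[parent[v]][v])
            v = parent[v]
        # Push flow: reduce forward capacities, increase reverse capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] += bottleneck
            v = u
        total += bottleneck

def assign(eligible, max_load, reviews_per_paper):
    """Build source -> reviewer -> paper -> sink and read pairs off the flow."""
    cap = defaultdict(lambda: defaultdict(int))
    papers = {p for ps in eligible.values() for p in ps}
    for reviewer, ps in eligible.items():
        cap["source"][("r", reviewer)] = max_load
        for p in ps:
            cap[("r", reviewer)][("p", p)] = 1
    for p in papers:
        cap[("p", p)]["sink"] = reviews_per_paper
    flow = max_flow(cap, "source", "sink")
    # A reviewer -> paper edge with no residual capacity carries one review.
    pairs = sorted((r, p) for r, ps in eligible.items() for p in ps
                   if cap[("r", r)][("p", p)] == 0)
    return flow, pairs

# Invented example: three reviewers, three papers, two reviews per paper.
eligible = {"alice": ["P1", "P2"], "bob": ["P1", "P3"], "carol": ["P2", "P3"]}
flow, pairs = assign(eligible, max_load=2, reviews_per_paper=2)
print(flow, pairs)
```

If the returned flow equals the number of papers times `reviews_per_paper`, every paper received its full set of reviews. Preferences or fairness terms would turn this into a min-cost flow or a richer constraint model, which this sketch does not attempt.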

    Many-Valued Institutions for Constraint Specification

    We advance a general technique for enriching logical systems with soft constraints, making them suitable for specifying complex software systems where parts are put together not just based on how they meet certain functional requirements but also on how they optimise certain constraints. This added expressive power is required, for example, for capturing quality attributes that need to be optimised or, more generally, for formalising what are usually called service-level agreements. More specifically, we show how institutions endowed with a graded semantic consequence can accommodate soft-constraint satisfaction problems. We illustrate our approach by showing how, in the context of service discovery, one can quantify the compatibility of two specifications and thus formalise the selection of the most promising provider of a required resource. Peer reviewed.

    Deciphering the genome structure and paleohistory of _Theobroma cacao_

    We sequenced and assembled the genome of _Theobroma cacao_, an economically important tropical fruit tree crop that is the source of chocolate. The assembly corresponds to 76% of the estimated genome size and contains almost all previously described genes, with 82% of them anchored on the 10 _T. cacao_ chromosomes. Analysis of this sequence information highlighted specific expansion of some gene families during evolution, for example flavonoid-related genes. It also provides a major source of candidate genes for _T. cacao_ disease resistance and quality improvement. Based on the inferred paleohistory of the _T. cacao_ genome, we propose an evolutionary scenario whereby the 10 _T. cacao_ chromosomes were shaped from an ancestor through eleven chromosome fusions. The _T. cacao_ genome can be considered as a simple living relic of higher plant evolution.

    Survey sequencing and radiation hybrid mapping to construct comparative maps.

    In MURPHY WJ (ed.) Phylogenomics, Humana Press (Methods in Molecular Biology, 422). Radiation hybrid (RH) mapping has become one of the most well-established techniques for economically and efficiently navigating genomes of interest. The success of the technique relies on random chromosome breakage of a target genome, which is then captured by recipient cells missing a preselected marker. Selection for hybrid cells that have DNA fragments bearing the marker of choice, plus a random set of DNA fragments from the initial irradiation, generates a set of cell lines that recapitulates the genome of the target organism several-fold. Markers or genes of interest are analyzed by PCR using DNA isolated from each cell line. Statistical tools are applied to determine both the linear order of markers on each chromosome and the confidence of each placement. The resolution of the resulting map relies on many factors, most notably the degree of breakage from the initial radiation as well as the number of hybrid clones and the mean retention value. A high-resolution RH map of a genome derived from low-pass or survey sequencing (1x to 2x coverage) can provide essentially the same comparative data on gene order as high-coverage (greater than 7x) genome sequencing. When combined with fluorescence in situ hybridization, RH maps are complete and ordered blueprints for each chromosome. They give information about the relative order and spacing of genes and markers, and allow investigators to move between target and reference genomes, such as those of mouse or human, with ease, although the approach is not limited to mammalian genomes.

    Linkage mapping bovine EST-based SNP

    BACKGROUND: Existing linkage maps of the bovine genome primarily contain anonymous microsatellite markers. These maps have proved valuable for mapping quantitative trait loci (QTL) to broad regions of the genome, but more closely spaced markers are needed to fine-map QTL, and markers associated with genes and annotated sequence are needed to identify genes and sequence variation that may explain QTL. RESULTS: Bovine expressed sequence tag (EST) and bacterial artificial chromosome (BAC) sequence data were used to develop 918 single nucleotide polymorphism (SNP) markers to map genes on the bovine linkage map. DNA of sires from the MARC reference population was used to detect SNPs, and progeny and mates of heterozygous sires were genotyped. Chromosome assignments for 861 SNPs were determined by two-point analysis, and positions for 735 SNPs were established by multipoint analyses. Linkage maps of bovine autosomes with these SNPs represent 4585 markers in 2475 positions spanning 3058 cM. Markers include 3612 microsatellites, 913 SNPs and 60 other markers. Mean separation between marker positions is 1.2 cM. New SNP markers appear in 511 positions, with mean separation of 4.7 cM. Multi-allelic markers, mostly microsatellites, had a mean (maximum) of 216 (366) informative meioses, and a mean 3-lod confidence interval of 3.6 cM. Bi-allelic markers, including SNP and other marker types, had a mean (maximum) of 55 (191) informative meioses, and were placed within a mean 8.5 cM 3-lod confidence interval. Homologous human sequences were identified for 1159 markers, including 582 newly developed and mapped SNPs. CONCLUSION: Addition of these EST- and BAC-based SNPs to the bovine linkage map not only increases marker density, but also provides connections to gene-rich physical maps, including annotated human sequence. The map provides a resource for fine-mapping quantitative trait loci and identification of positional candidate genes, and can be integrated with other data to guide and refine assembly of the bovine genome sequence. Even after the bovine genome is completely sequenced, the map will continue to be a useful tool to link observable phenotypes and animal genotypes to underlying genes and molecular mechanisms influencing economically important beef and dairy traits.

    HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors

    BACKGROUND: Protein domain classification is an important step in metagenomic annotation. The state-of-the-art method for protein domain classification is profile HMM-based alignment. However, the relatively high rates of insertions and deletions in homopolymer regions of pyrosequencing reads create frameshifts, causing conventional profile HMM alignment tools to generate alignments with marginal scores. This makes error-containing gene fragments unclassifiable with conventional tools. Thus, there is a need for an accurate domain classification tool that can detect and correct sequencing errors. RESULTS: We introduce HMM-FRAME, a protein domain classification tool based on an augmented Viterbi algorithm that can incorporate error models from different sequencing platforms. HMM-FRAME corrects sequencing errors and classifies putative gene fragments into domain families. It achieved high error detection sensitivity and specificity in a data set with annotated errors. We applied HMM-FRAME in Targeted Metagenomics and a published metagenomic data set. The results showed that our tool can correct frameshifts in error-containing sequences, generate much longer alignments with significantly smaller E-values, and classify more sequences into their native families. CONCLUSIONS: HMM-FRAME provides a complementary protein domain classification tool to conventional profile HMM-based methods for data sets containing frameshifts. Its current implementation is best used for small-scale metagenomic data sets. The source code of HMM-FRAME can be downloaded at http://www.cse.msu.edu/~zhangy72/hmmframe/ and at https://sourceforge.net/projects/hmm-frame/.

    MetWAMer: eukaryotic translation initiation site prediction

    BACKGROUND: Translation initiation site (TIS) identification is an important aspect of the gene annotation process, requisite for the accurate delineation of protein sequences from transcript data. We have developed the MetWAMer package for TIS prediction in eukaryotic open reading frames of non-viral origin. MetWAMer can be used as a stand-alone, third-party tool for post-processing gene structure annotations generated by external computational programs and/or pipelines, or directly integrated into gene structure prediction software implementations. RESULTS: MetWAMer currently implements five distinct methods for TIS prediction, the most accurate of which is a routine that combines weighted, signal-based translation initiation site scores and the contrast in coding potential of sequences flanking TISs using a perceptron. Also, our program implements clustering capabilities through use of the k-medoids algorithm, thereby enabling cluster-specific TIS parameter utilization. In practice, our static weight array matrix-based indexing method for parameter set lookup can be used with good results in data sets exhibiting moderate levels of 5'-complete coverage. CONCLUSION: We demonstrate that improvements in statistically-based models for TIS prediction can be achieved by taking the class of each potential start-methionine into account pending certain testing conditions, and that our perceptron-based model is suitable for the TIS identification task. MetWAMer represents a well-documented, extensible, and freely available software system that can be readily re-trained for differing target applications and/or extended with existing and novel TIS prediction methods, to support further research efforts in this area.